Developing an Online Indonesian Corpora Repository
Abstract
This paper describes efforts to develop an online repository of Indonesian corpora, together with its associated functions and services, designed to support a wide variety of use cases and applications. Two key design considerations are ensuring the sustainability and accessibility of the corpora, and enabling open enrichment through annotation. The presented model supports OLAC-compliant metadata, is built atop an OAIS-compliant core repository, and exposes data and functionality via RESTful web services. A prototype implementation is presented that allows users to upload, browse, and search the collection; its extensible content model currently supports POS tagging. The plan is to package the language-independent aspects of the system and release them as open source to aid the development of corpus repositories for other languages.
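To make the OLAC-compliant metadata mentioned in the abstract more concrete, here is a minimal Python sketch of what a record for one corpus might look like. It is an illustration, not the authors' implementation. It assumes the OLAC 1.1 record layout (Dublin Core elements carrying OLAC extension attributes). The function name `build_olac_record` and the sample title and date are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OLAC 1.1 metadata records.
DC_NS = "http://purl.org/dc/elements/1.1/"
OLAC_NS = "http://www.language-archives.org/OLAC/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def build_olac_record(title: str, date: str, language_code: str = "ind") -> str:
    """Return an OLAC-style XML record as a string (hypothetical helper)."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("olac", OLAC_NS)
    ET.register_namespace("xsi", XSI_NS)

    root = ET.Element(f"{{{OLAC_NS}}}olac")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    ET.SubElement(root, f"{{{DC_NS}}}date").text = date

    # Subject language, given as an ISO 639-3 code ("ind" is Indonesian).
    language = ET.SubElement(root, f"{{{DC_NS}}}language")
    language.set(f"{{{XSI_NS}}}type", "olac:language")
    language.set(f"{{{OLAC_NS}}}code", language_code)

    # Linguistic type: a corpus is described as primary text.
    ling_type = ET.SubElement(root, f"{{{DC_NS}}}type")
    ling_type.set(f"{{{XSI_NS}}}type", "olac:linguistic-type")
    ling_type.set(f"{{{OLAC_NS}}}code", "primary_text")

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(build_olac_record("Sample POS-tagged Indonesian corpus", "2010"))
```

A repository along these lines could serve such records from a RESTful endpoint so that OLAC harvesters and other clients can discover the corpora. The endpoint design itself is not specified in the abstract.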
Similar resources
LEarning and TEaching corpora (LETEC): data-sharing and repository for research on multimodal interactions
The number of online environments language teachers can employ is constantly growing, offering increased potential for multimodal L2 interaction analysis. This paper introduces the LEarning and TEaching Corpora (LETEC) methodology that links, following international standards, all elements resulting from an online learning situation, whose context is described by a pedagogical scenario and a re...
A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization
In this paper we report our effort to construct the first ever Indonesian corpora for chat summarization. Specifically, we utilized documents of multi-participant chat from a well-known online instant messaging application, WhatsApp. We construct the gold standard by asking three native speakers to manually summarize 300 chat sections (152 of them contain images). As a result, three reference sum...
Developing Parallel Sense-tagged Corpora with Wordnets
Semantically annotated corpora play an important role in natural language processing. This paper presents the results of a pilot study on building a sense-tagged parallel corpus, part of ongoing construction of aligned corpora for four languages (English, Chinese, Japanese, and Indonesian) in four domains (story, essay, news, and tourism) from the NTU-Multilingual Corpus. Each subcorpus is firs...
Indonesian Perspective on Massive Open Online Courses: Opportunities and Challenges
There are two indications that Indonesia needs to improve its education quality. The first is the Human Development Index (HDI), which is still at the medium level, and the second is the enrollment rate in higher education, which is also at the low level. MOOCs have the potential to solve both problems. However, implementing MOOCs in a developing country needs a specific analysis to determine t...
The Barriers of the Indonesian Extension Workers in Disseminate Agricultural Information to Farmers
Agriculture plays an important role in both poverty reduction and economic growth. The technology of agriculture in the developing world should change continuously to keep pace with rising populations and rapidly changing social, economic, and environmental conditions. The role of agricultural extension services to disseminate approp...
Publication date: 2010